Cross-Lingual Sentiment Analysis with Machine Translation

Authors

Erkin Demirtas

Mykola Pechenizkiy

Abstract

Recent advances in machine translation have fostered interest in its use for sentiment analysis. This thesis investigates the prospects and limitations of using machine translation in cross-lingual sentiment analysis. Sentiment analysis requires learning linguistic features, either with tools such as part-of-speech taggers and parsers or from basic resources such as annotated corpora and sentiment lexica. We are motivated to study the translation of existing English resources because building such tools and resources for each language requires considerable human effort, which severely limits the implementation of language-specific sentiment analysis techniques similar to those developed for English. Labeled corpora and sentiment lexica are the two main resources in applied sentiment analysis. We translate them into a language with limited resources, focusing on improving classification accuracy when (labeled or raw) training instances are available. In some cases, however, no training data is available. To address this scenario, we explore methods for translating sentiment lexica into a target language, and we also try to improve machine translation performance by generating additional context. All experiments use English and Turkish data consisting of movie and product reviews, and we perform two-class (positive/negative) classification, that is, polarity detection, discarding the neutral class. We obtain promising results in polarity detection experiments using general-purpose classifiers trained on translated corpora, although dissimilarities between corpora in different languages should be studied further for better integration of resources.
We also find quantitative evidence suggesting that lexicon translation is more troublesome, since inherent differences in how the two languages express sentiment make it harder to preserve the sentiment of words and phrases when translating them from one language to another.
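
The corpus-translation setting described in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: the `translate` function and `TOY_GLOSSARY` are hypothetical stand-ins for a real machine translation system, the four reviews are invented toy data, and multinomial Naive Bayes is chosen here only as one example of a general-purpose classifier.

```python
import math
from collections import Counter, defaultdict

# Toy English-to-Turkish glossary standing in for a real MT system (assumption).
TOY_GLOSSARY = {
    "great": "harika", "movie": "film", "bad": "kötü",
    "good": "iyi", "terrible": "berbat", "product": "ürün",
}

def translate(text):
    """Hypothetical stand-in for machine translation: word-by-word lookup."""
    return " ".join(TOY_GLOSSARY.get(w, w) for w in text.lower().split())

def train_naive_bayes(docs, labels):
    """Collect per-class word counts from whitespace-tokenised documents."""
    word_counts = defaultdict(Counter)
    class_counts = Counter(labels)
    for doc, label in zip(docs, labels):
        word_counts[label].update(doc.split())
    vocab = {w for counts in word_counts.values() for w in counts}
    return word_counts, class_counts, vocab

def predict(model, doc):
    """Return the class with the highest log-posterior (add-one smoothing)."""
    word_counts, class_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        score = math.log(n_docs / total_docs)
        total_words = sum(word_counts[label].values())
        for w in doc.split():
            if w in vocab:  # words unseen in training are ignored
                score += math.log(
                    (word_counts[label][w] + 1) / (total_words + len(vocab))
                )
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Invented labeled English reviews (two-class polarity, no neutral class).
english_reviews = ["great movie", "good product", "bad movie", "terrible product"]
labels = ["positive", "positive", "negative", "negative"]

# Translate the labeled corpus, then train on the target-language text.
translated = [translate(r) for r in english_reviews]
model = train_naive_bayes(translated, labels)
```

With this setup, `predict(model, "harika film")` classifies a target-language review using only labels that originated in English. A real pipeline would replace the glossary lookup with an actual translation system and evaluate on native target-language reviews.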


Related resources

Cross-Lingual Sentiment Analysis Without (Good) Translation

Current approaches to cross-lingual sentiment analysis try to leverage the wealth of labeled English data using bilingual lexicons, bilingual vector space embeddings, or machine translation systems. Here we show that it is possible to use a single linear transformation, with as few as 2000 word pairs, to capture fine-grained sentiment relationships between words in a cross-lingual setting. We a...


Co-Training for Cross-Lingual Sentiment Classification

The lack of Chinese sentiment corpora limits the research progress on Chinese sentiment classification. However, there are many freely available English sentiment corpora on the Web. This paper focuses on the problem of cross-lingual sentiment classification, which leverages an available English corpus for Chinese sentiment classification by using the English corpus as training data. Machine tr...


Cross-lingual sentiment classification: Similarity discovery plus training data adjustment

The performance of cross-lingual sentiment classification is sharply limited by the language gap, which means that each language has its own ways to express sentiments. Many methods have been designed to transmit sentiment information across languages by making use of machine translation, parallel corpora, auxiliary unlabeled samples and other resources. In this paper, a new approach is propose...


Exploring Distributional Representations and Machine Translation for Aspect-based Cross-lingual Sentiment Classification

Cross-lingual sentiment classification (CLSC) seeks to use resources from a source language in order to detect sentiment and classify text in a target language. Almost all research into CLSC has been carried out at sentence and document level, although this level of granularity is often less useful. This paper explores methods for performing aspect-based cross-lingual sentiment classification (...


Combination of Multi-view Multi-source Language Classifiers for Cross-Lingual Sentiment Classification

Cross-lingual sentiment classification aims to conduct sentiment classification in a target language using labeled sentiment data in a source language. Most existing research relies on machine translation to directly project information from one language to another. But cross-lingual classifiers cannot always learn all characteristics of target language data by using only translated data fr...


The Haves and the Have-Nots: Leveraging Unlabelled Corpora for Sentiment Analysis

Expensive feature engineering based on WordNet senses has been shown to be useful for document level sentiment classification. A plausible reason for such a performance improvement is the reduction in data sparsity. However, such a reduction could be achieved with a lesser effort through the means of syntagma based word clustering. In this paper, the problem of data sparsity in sentiment analys...




Publication date: 2013
